Abstract: Many classification problems have an imbalanced class distribution, and the class imbalance problem becomes even more severe when the dimensionality is high. One commonly used strategy for improving classification performance is feature selection, which selects a subset of relevant features that allows a classifier to reach optimal performance. Most feature selection methods for imbalanced datasets focus on the two-class case and do not work well on multiclass imbalanced datasets. In this paper, we propose MINDEX_IB, a filter-based feature selection algorithm for imbalanced datasets. The proposed measure focuses on efficient partitioning of the attribute domain, where the partitioning is done via micro-clustering, i.e., by forming micro-clusters. MINDEX_IB outperforms other feature selection algorithms in terms of the number of features selected, accuracy, and performance measures suited to imbalanced datasets, such as F-measure and AUC.
Keywords: Feature selection, Imbalanced dataset, Classification, Filter-based approach.